The Collective Data Mining: A Technology for Ubiquitous Data Analysis from Distributed Heterogeneous Sites

Authors

  • Hillol Kargupta
  • Byung-Hoon Park
Abstract

This paper introduces collective data mining (CDM), a unique approach to distributed data mining (DDM) from heterogeneous sites. It points out that naive approaches to distributed data analysis in a heterogeneous environment may face ambiguous situations and may lead to an incorrect global data model. It also observes that any function can be expressed in a distributed fashion using a set of appropriate basis functions, and that orthonormal basis functions can be used effectively to develop a general framework for DDM that guarantees correct local analysis, resulting in a correct global data model with minimal data communication. The paper develops the foundation of CDM, presents a case study for decision tree learning in CDM, and briefly describes BODHI, a CDM-based experimental system.
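The observation about orthonormal basis functions can be illustrated with a small sketch. This is not the paper's algorithm: the Walsh (parity) basis, the two hypothetical sites, the `target` function, and all names below are assumptions chosen for illustration. The sketch projects a function of three Boolean features onto an orthonormal basis, checks that the weighted sum of basis functions reproduces the function exactly, and lists the coefficients whose basis function touches features from both sites, since those terms cannot be computed from one site's features alone.

```python
from itertools import product

# Assumption for illustration: features x0, x1 are stored at hypothetical site A
# and feature x2 at hypothetical site B.
SITE_A = {0, 1}
SITE_B = {2}


def walsh(j, x):
    """Walsh basis function psi_j(x) = (-1)^(j . x) for bit tuples j and x."""
    return -1 if sum(a & b for a, b in zip(j, x)) % 2 else 1


def walsh_coefficients(f, n):
    """Project f onto every Walsh basis function over n Boolean features."""
    points = list(product((0, 1), repeat=n))
    return {
        j: sum(f(x) * walsh(j, x) for x in points) / len(points)
        for j in points
    }


def reconstruct(coeffs, x):
    """Evaluate the function at x as a weighted sum of basis functions."""
    return sum(w * walsh(j, x) for j, w in coeffs.items())


def target(x):
    """Hypothetical global function with a term that couples both sites."""
    return 2 * x[0] + 3 * x[1] * x[2]


coeffs = walsh_coefficients(target, 3)
points = list(product((0, 1), repeat=3))

# The basis expansion reproduces the function at every point.
max_error = max(abs(reconstruct(coeffs, x) - target(x)) for x in points)

# Nonzero coefficients whose index uses features from both sites.
cross_terms = {
    j: w
    for j, w in coeffs.items()
    if w != 0
    and any(j[i] for i in SITE_A)
    and any(j[i] for i in SITE_B)
}

print("max reconstruction error:", max_error)
print("cross-site terms:", cross_terms)
```

In this toy case only one nonzero coefficient spans both sites; every other nonzero coefficient depends on the features of a single site.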

Similar resources

Clustered Collaborative Filtering Approach for Distributed Data Mining on Electronic Health Records

Distributed Data Mining (DDM) has become one of the promising areas of data mining. DDM techniques include the classifier approach and the agent approach. The classifier approach plays a vital role in mining distributed data, with homogeneous and heterogeneous variants depending on the data sites. The homogeneous classifier approach involves ensemble learning, distributed association rule mining, meta-learning an...

Collective Data Mining: A New Perspective Toward Distributed Data Mining

This paper introduces collective data mining (CDM), a new approach toward distributed data mining (DDM) from heterogeneous sites. It points out that naive approaches to distributed data analysis in a heterogeneous environment may face ambiguous situations and may lead to an incorrect global data model. It also observes that any function can be expressed in a distributed fashion using a set of a...

Distributed Incremental Least Mean-Square for Parameter Estimation using Heterogeneous Adaptive Networks in Unreliable Measurements

Adaptive networks consist of a set of nodes with adaptation and learning abilities for modeling various types of self-organized and complex activities encountered in the real world. This paper presents the effect of a heterogeneously distributed incremental LMS algorithm with ideal links on the quality of unknown parameter estimation. In heterogeneous adaptive networks, a fraction of the nodes, defi...

Collective Data Mining from Distributed, Vertically Partitioned Feature Space

This paper develops collective data mining, a unique approach for finding patterns from a network of databases, each with a distinct feature space. This paper addresses both distributed cooperative learning at the global level and learning at the local data sites. In addition to developing the foundation of collective data mining, it also presents BODHI, a distributed data mining (DDM) ...

Distributed Multivariate Regression Using Wavelet-Based Collective Data Mining

This paper presents a method for distributed multivariate regression using wavelet-based Collective Data Mining (CDM). The method seamlessly blends machine learning and the theory of communication with the statistical methods employed in parametric multivariate regression to provide an effective data mining technique for use in a distributed data and computation environment. The technique is app...


Publication date: 1998